Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Control the number of rows returned by SelectResult #9273

Closed
wants to merge 8 commits into from

Conversation

qw4990
Copy link
Contributor

@qw4990 qw4990 commented Feb 11, 2019

What problem does this PR solve?

Control the number of rows returned by SelectResult.
This PR is a subtask of #9166.

What is changed and how it works?

Add requiredRows into Chunk to specify how many rows the parent executor want.
And update selectResult and streamResult to support this feature.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change
  • Has interface methods change

@qw4990 qw4990 requested review from zz-jason and lysu February 11, 2019 11:42
@qw4990 qw4990 added the sig/execution SIG execution label Feb 11, 2019
@qw4990 qw4990 requested review from eurekaka and XuHuaiyu February 11, 2019 11:43
@qw4990 qw4990 changed the title Batch size control select result Control the number of rows returned by SelectResult Feb 11, 2019
@qw4990 qw4990 changed the title Control the number of rows returned by SelectResult Control the number of rows returned by SelectResult Feb 11, 2019
@qw4990 qw4990 force-pushed the batchSizeControl_SelectResult branch from 93e996d to fde0c6a Compare February 11, 2019 11:48
@qw4990 qw4990 requested a review from alivxxx February 11, 2019 11:50
@codecov-io
Copy link

codecov-io commented Feb 11, 2019

Codecov Report

Merging #9273 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9273      +/-   ##
==========================================
+ Coverage   67.22%   67.24%   +0.01%     
==========================================
  Files         371      371              
  Lines       77271    77283      +12     
==========================================
+ Hits        51949    51968      +19     
+ Misses      20682    20677       -5     
+ Partials     4640     4638       -2
Impacted Files Coverage Δ
distsql/stream.go 59.72% <100%> (+1.38%) ⬆️
util/chunk/recordbatch.go 100% <100%> (+100%) ⬆️
distsql/select_result.go 75% <100%> (+1.5%) ⬆️
executor/distsql.go 72.53% <0%> (-0.47%) ⬇️
store/tikv/lock_resolver.go 42.65% <0%> (+0.94%) ⬆️
util/systimemon/systime_mon.go 100% <0%> (+20%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f9ab60...973b698. Read the comment docs.

// requiredRows indicates how many rows is considered full for parent executor.
// Child executor can return immediately if there are such number of rows,
// instead of fulling the whole chunk.
// This is not compulsory, so the number of returned rows can be larger than it in some cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about:

    // requiredRows indicates how many rows is required by the parent executor.
    // Child executor should stop populating rows immediately if there are at
    // least required rows in the Chunk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments have been updated.

}

// IsFull returns if this batch can be considered full.
func (rb *RecordBatch) IsFull(maxChunkSize int) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove maxChunkSize from the interface? Only check whether rb.NumRows() >= rb.requiredRows? we should set rb.requiredRows according to max chunk size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only checking requiredRows is dangerous and not convenient.
I prefer to take both requiredRows and MaxChunkSize into account in this function.
How about introducing sessionctx.Context into RecordBatch.
Then the IsFull will behave like:

IsFull() bool {
    numRows >= ctx.GetSessionVars().MaxChunkSize || 
    (requiredRows != UnspecifiedNumRows && numRows >= requiredRows)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is requiredRows determined by maxChunkSize?

Copy link
Contributor Author

@qw4990 qw4990 Feb 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requiredRows should be equal or less than maxChunkSize, but it can be Unspecified.
When it's Unspecified, it's dangerous to always return false if it ignores maxChunkSize.

And in each place where IsFull is used, it is not convenient to check both IsFull and maxChunkSize like:

if !batch.IsFull() && batch.NumRows() < ctx.GetSessionVars().MaxChunkSize { ... }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found there are some executors have their own maxChunkSize instead of using global maxChunkSize in sessionctx.Context, so the approach mentioned above is not appropriate.

I will remove maxChunkSize from IsFull and let the outside scope where IsFull is called check it.

}

// SetRequiredRows sets the number of rows the parent executor want.
func (rb *RecordBatch) SetRequiredRows(numRows int) *RecordBatch {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allocate a RecordBatch with zero default values, rb := &RecordBatch{}, and without calling SetRequiredRows(), 0 is not treated like unspecified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnspecifiedNumRows has been updated to zero value of int.

distsql/stream.go Outdated Show resolved Hide resolved
chk.Reset()
maxChunkSize := r.ctx.GetSessionVars().MaxChunkSize
for chk.NumRows() < maxChunkSize {
return r.NextBatch(ctx, chunk.NewRecordBatch(chk))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about setting the required rows of that newly created batch to max chunk size?

}

// IsFull returns if this batch can be considered full.
func (rb *RecordBatch) IsFull(maxChunkSize int) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the relationship between the param maxChunkSize and the member variable requiredRows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxChunkSize has been removed from this function now.

// SetRequiredRows sets the number of rows the parent executor want.
func (rb *RecordBatch) SetRequiredRows(numRows int) *RecordBatch {
if numRows <= 0 {
numRows = UnspecifiedNumRows
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If numRows <= 0, then set the requiredRows to ctx.MaxChunkSize?

@qw4990 qw4990 force-pushed the batchSizeControl_SelectResult branch from 037fa57 to 973b698 Compare February 12, 2019 12:54
@qw4990
Copy link
Contributor Author

qw4990 commented Feb 13, 2019

After discussing with @zz-jason and @lysu , this approach to control the number of rows has been deprecated, and the new solution is in #9293 .

@qw4990 qw4990 closed this Feb 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/execution SIG execution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants